DETAILED ACTION

Response to Amendment 

1. The preliminary amendment filed on 9/26/05 has been entered in the case file.



Specification 

Content of Specification (irrelevant sections omitted) 

(f) Background of the Invention: See MPEP § 608.01(c). The specification
should set forth the Background of the Invention in two parts: 

(1) Field of the Invention: A statement of the field of art to which the
invention pertains. This statement may include a paraphrasing of 
the applicable U.S. patent classification definitions of the subject 
matter of the claimed invention. This item may also be titled 
"Technical Field." 

(2) Description of the Related Art including information disclosed under
37 CFR 1.97 and 37 CFR 1.98: A description of the related art
known to the applicant and including, if applicable, references to 
specific related art and problems involved in the prior art which are 
solved by the applicant's invention. This item may also be titled 
"Background Art." 

(g) Brief Summary of the Invention: See MPEP § 608.01(d). A brief summary
or general statement of the invention as set forth in 37 CFR 1.73. The
summary is separate and distinct from the abstract and is directed toward 
the invention rather than the disclosure as a whole. The summary may 
point out the advantages of the invention or how it solves problems 
previously existent in the prior art (and preferably indicated in the 
Background of the Invention). In chemical cases it should point out in 
general terms the utility of the invention. If possible, the nature and gist of 
the invention or the inventive concept should be set forth. Objects of the 
invention should be treated briefly and only to the extent that they 
contribute to an understanding of the invention. 



(h) Brief Description of the Several Views of the Drawing(s): See MPEP §
608.01(f). A reference to and brief description of the drawing(s) as set
forth in 37 CFR 1.74.

(i) Detailed Description of the Invention: See MPEP § 608.01(g). A
description of the preferred embodiment(s) of the invention as required in
37 CFR 1.71. The description should be as short and specific as is
necessary to describe the invention adequately and accurately. Where 
elements or groups of elements, compounds, and processes, which are 
conventional and generally widely known in the field of the invention 
described and their exact nature or type is not necessary for an 
understanding and use of the invention by a person skilled in the art, they 
should not be described in detail. However, where particularly 
complicated subject matter is involved or where the elements, 
compounds, or processes may not be commonly or widely known in the 
field, the specification should refer to another patent or readily available 
publication which adequately describes the subject matter. 

2. The disclosure is objected to because of the following informalities. First, the
specification has no headings to indicate its various sections. Additionally, there are
no separate sections specifically directed to the background of the invention or the
summary of the invention. Finally, there is no brief description of the several views
of the drawing(s).

Appropriate correction is required. 



Claim Rejections - 35 USC §112 

3. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

4. Claims 7, 8 and 14 are rejected under 35 U.S.C. 112, second paragraph, as 
being indefinite for failing to particularly point out and distinctly claim the subject matter 
which applicant regards as the invention. 

Claims 7 and 8 recite "the associated stored form"; however, it is unclear whether the
claims refer to the stored form on the user terminal or the stored form on the
server. For the purposes of examination, it will be assumed that the claims refer to the
stored form on the user terminal.

As per claim 14, the claim is directed to "a server"; however, several of the means
recited for the server involve transmitting a signal to the server. It is unclear how the
server transmits signals to itself. It appears that the newly amended subject matter
recites the same means as those recited for the user terminal claimed in claim 9.
Because claim 14 is directed to a server, these means will be ignored.



Claim Rejections - 35 USC § 102 

5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

6. Claims 1, 4 and 6-8 are rejected under 35 U.S.C. 102(b) as being anticipated by
Baker (U.S. Pat. 6,122,613), cited by the Applicant. 

As per claim 1, Baker teaches a distributed speech recognition method, 
comprising at least one user terminal and at least one server, capable of communicating 
with one another via a telecommunications network, wherein, at the user terminal, at 
least the following steps are performed: 
- obtain an audio signal to be recognized (Fig. 3, element 301);

- calculate modeling parameters for the audio signal to be recognized; and

- attempt to associate a stored form with the modeling parameters (real-time recognizer,
Fig. 3, element 303); and

- independently of the step for attempting to associate a stored form, transmit a 
signal indicating the audio signal to be recognized to the server (offline recognizer, col. 
8, lines 12-18 and Fig. 3, element 309); and 

wherein, at the server, at least the following steps are performed: 

- receive the signal transmitted by the user terminal;

- attempt to associate a stored form with the received signal (independent speech
recognition, col. 8, lines 12-18).
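For illustration only, the claim 1 steps as mapped to Baker above can be sketched in Python. This is a minimal sketch under stated assumptions: the names compute_parameters, associate, terminal_step and server_step, the per-frame mean-amplitude "modeling parameters", and the distance-threshold template matching are all hypothetical and are not taken from Baker or from the application. The point shown is only that the transmission to the server does not depend on the outcome of the local association attempt.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical store of forms: label -> reference parameter vector.
Templates = Dict[str, List[float]]


def compute_parameters(audio: Sequence[float], frame: int = 4) -> List[float]:
    """Toy modeling parameters: mean absolute amplitude of each full frame."""
    return [
        sum(abs(x) for x in audio[i:i + frame]) / frame
        for i in range(0, len(audio) - frame + 1, frame)
    ]


def associate(params: List[float], templates: Templates,
              max_distance: float = 0.5) -> Optional[str]:
    """Return the closest stored form, or None if none is close enough."""
    best_label: Optional[str] = None
    best_dist = float("inf")
    for label, ref in templates.items():
        if len(ref) != len(params):
            continue  # only compare vectors of equal length
        dist = sum((a - b) ** 2 for a, b in zip(params, ref)) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None


def terminal_step(audio: Sequence[float], local_templates: Templates,
                  send_to_server: Callable[[List[float]], None]) -> Optional[str]:
    params = compute_parameters(audio)
    # Transmission happens whether or not local association succeeds.
    send_to_server(list(audio))
    return associate(params, local_templates)


def server_step(received_audio: Sequence[float],
                server_templates: Templates) -> Optional[str]:
    # The server computes its own parameters and makes its own attempt.
    return associate(compute_parameters(received_audio), server_templates)
```

For example, with audio [1, 1, 1, 1, 0, 0, 0, 0] the toy parameters are [1.0, 0.0]; the terminal still forwards the audio to the server even when its local template store is empty and no local form is found.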

7. As per claim 4, Baker teaches wherein the transmitted signal is the original audio 
signal (col. 8, lines 12-18). 

8. As per claim 6, Baker teaches wherein the associated stored form determined at 
the terminal is chosen, when the associated form exists (speech recognition only 
returns existing models, col. 7, lines 43-49). 

9. As per claim 7, Baker teaches wherein the associated stored form determined 
the quickest is chosen (real-time recognizer, col. 7, lines 43-49). 

10. As per claim 8, Baker teaches wherein the associated form judged best
according to a defined criterion is chosen (speech recognition chooses the most 
probable result, col. 7, lines 43-49). 
11. Claims 14-19 are rejected under 35 U.S.C. 102(e) as being anticipated by
Reding et al. (U.S. Pat. 6,823,306). 

As per claim 14, Reding teaches a server adapted for cooperating with a user 
terminal comprising: 

means for receiving a signal coming from a user terminal and selected at said 
terminal; and recognition means for associating at least one stored form with modeling 
parameters at the input (speech processing facility receives speech and performs 
speech recognition operations thereon, col. 6, line 58 to col. 7, line 4). 

12. As per claim 15, Reding teaches means for calculating modeling parameters for 
an input signal; control means for controlling the calculation means and the recognition 
means such that: 

• when the signal received by the reception means is of the audio type, the 
parameter calculation means are activated by addressing the selected signal to them as 
input signal, and the parameters calculated by the calculation means are addressed to 
the recognition means as input parameters (if speech is received then feature extraction 
is performed, col. 13, line 63 to col. 14, line 4); and 

• when the selected signal received by the reception means indicates modeling 
parameters, said indicated parameters are addressed to the recognition means as input 
parameters (speech recognition of extracted features, col. 14, lines 5-12). 
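The control logic of claim 15, as read on Reding above, amounts to routing the received signal by type. A minimal Python sketch follows; the AudioSignal and ParameterSignal types and the calculate and recognize callables are hypothetical stand-ins and are not Reding's implementation.

```python
from typing import Callable, List, NamedTuple, Union


class AudioSignal(NamedTuple):
    samples: List[float]      # raw audio received from the terminal


class ParameterSignal(NamedTuple):
    parameters: List[float]   # modeling parameters computed at the terminal


Received = Union[AudioSignal, ParameterSignal]


def server_control(signal: Received,
                   calculate: Callable[[List[float]], List[float]],
                   recognize: Callable[[List[float]], str]) -> str:
    if isinstance(signal, AudioSignal):
        # Audio: activate parameter calculation, then pass its output on.
        params = calculate(signal.samples)
    elif isinstance(signal, ParameterSignal):
        # Parameters: bypass calculation and feed the recognizer directly.
        params = list(signal.parameters)
    else:
        raise TypeError(f"unsupported signal type: {type(signal).__name__}")
    return recognize(params)
```

In this sketch the parameter calculation is invoked only for audio input; a parameter signal goes straight to recognition.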

13. As per claim 16, Reding teaches means for detecting activity in order to produce
the signal to be recognized in the form of speech segments extracted from an original 
audio signal outside of periods without voice activity and in which the control means are 
designed to control the parameter calculation means (monitors for input speech, col. 11,
lines 27-33) and the recognition means when the received signal is of the audio type 
such that: 

if the received signal of the audio type is in the form of speech segments after 
voice activation detection, the parameter calculation means are activated by addressing 
the received signal to them as input signal, then the parameters calculated by the 
parameter calculation means are addressed to the recognition means as input 
parameters; otherwise, the server voice activation detection means are activated by 
addressing the received signal to them as input signal, then the segments extracted by 
the voice activation detection means are addressed to the parameter calculation means 
as input parameters, then the parameters calculated by the parameter calculation 
means are addressed to the recognition means as input parameters (features of the 
detected speech are extracted at the server, col. 13, line 63 to col. 14, line 4). 
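The conditional routing of claim 16 (and of claim 5 below) can likewise be sketched. In this hypothetical sketch, the already_segmented flag stands in for however the server learns that terminal-side voice activity detection was applied, and vad, calculate and recognize are placeholder callables; none of these names come from Reding.

```python
from typing import Callable, List, Sequence


def route_audio(samples: Sequence[float], already_segmented: bool,
                vad: Callable[[Sequence[float]], List[float]],
                calculate: Callable[[Sequence[float]], List[float]],
                recognize: Callable[[List[float]], str]) -> str:
    if already_segmented:
        # Terminal already removed non-speech: go straight to parameter calculation.
        speech = list(samples)
    else:
        # Otherwise run the server-side VAD first and keep only its speech output.
        speech = vad(samples)
    # In both branches the calculated parameters feed the recognizer.
    return recognize(calculate(speech))
```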

14. As per claim 17, Reding teaches means for downloading voice recognition 
software resources via the telecommunications network onto a terminal (model training 
performed on server and updated to user terminal, Fig. 7). 

15. As per claim 18, Reding teaches wherein said resources comprise at least one 
module from amongst: a VAD module, a module for calculating modeling parameters for 
an audio signal and a recognition module for associating at least one stored form with 
modeling parameters (feature extractor and speech recognition, Fig. 10). 

16. As per claim 19, Reding teaches means for determining the stored form to be
chosen between the stored forms determined at the terminal and at the server,
respectively (determines whether to perform speech recognition at the terminal or the
server, col. 11, lines 34-48).
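As an illustration of what a means for choosing between the terminal's and the server's stored forms might do, the following sketch implements three possible criteria of the kind recited in claims 6-8: prefer the terminal's form when one exists, take whichever form was obtained fastest, or take the best-scoring form. The Result type, its latency field and the score scale are assumptions made for this example only.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    label: Optional[str]   # None when no stored form was associated
    latency_s: float       # hypothetical time taken to produce the result
    score: float           # hypothetical match score; higher is better


def choose_terminal_first(terminal: Result, server: Result) -> Optional[str]:
    # Use the terminal's form whenever it exists, else fall back to the server's.
    return terminal.label if terminal.label is not None else server.label


def choose_quickest(terminal: Result, server: Result) -> Optional[str]:
    # Among results that actually found a form, take the fastest one.
    found = [r for r in (terminal, server) if r.label is not None]
    return min(found, key=lambda r: r.latency_s).label if found else None


def choose_best(terminal: Result, server: Result) -> Optional[str]:
    # Among results that actually found a form, take the highest score.
    found = [r for r in (terminal, server) if r.label is not None]
    return max(found, key=lambda r: r.score).label if found else None
```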

Claim Rejections - 35 USC § 103 

17. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

18. Claims 2, 3 and 5 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Baker in view of Reding. 

As per claim 2, Baker does not teach wherein the signal transmitted by the user 
terminal to the server is selected from at least the audio signal to be recognized and a 
signal indicating the modeling parameters; wherein, if the received signal is of the audio 
type, the server calculates modeling parameters for the received audio signal and 
attempts to associate a stored form with the modeling parameters of the received audio 
signal; and wherein, if the received signal indicates modeling parameters, the server 
attempts to associate a stored form with said modeling parameters. 

Reding teaches transmitting either the speech signal or a feature vector and if 
the speech signal is transmitted the features are calculated at the server prior to speech 
recognition (col. 12, lines 47-59 and col. 13, line 63 to col. 14, line 12). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Baker wherein the signal transmitted by the user terminal to the 
server is selected from at least the audio signal to be recognized and a signal indicating
the modeling parameters; wherein, if the received signal is of the audio type, the server 
calculates modeling parameters for the received audio signal and attempts to associate 
a stored form with the modeling parameters of the received audio signal; and wherein, if 
the received signal indicates modeling parameters, the server attempts to associate a 
stored form with said modeling parameters as taught by Reding, because in distributed
speech recognition there is a finite number of predictable methods, such as performing
all the processing at the terminal, performing all the processing at the server,
performing all the processing at both the terminal and the server, or splitting the
processing between the terminal and the server. Therefore, it would have been obvious
to try Reding's method of splitting the processing in the Baker process.

19. As per claim 3, Baker does not teach wherein obtaining the signal to be
recognized at the terminal comprises a VAD in order to produce the audio signal to be 
recognized in the form of speech segments extracted from an original audio signal 
outside periods without voice activity. 

Reding teaches monitoring the input and only performing speech processing 
when speech is detected (col. 11, lines 27-33). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Baker to include a VAD as taught by Reding because this is a 
known technique to improve a similar device in the same way. Specifically, the VAD 
would prevent unneeded processing when there is no speech input in the system. 
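The rationale above (a VAD avoids processing when no speech is present) can be illustrated with a very simple energy-threshold detector. This is a minimal sketch, assuming fixed-length frames and a hand-picked threshold; the function name energy_vad and its parameters are hypothetical, and it is not asserted that Reding uses this particular technique.

```python
from typing import List, Sequence


def energy_vad(samples: Sequence[float], frame_len: int = 4,
               threshold: float = 0.1) -> List[List[float]]:
    """Return speech segments: runs of consecutive frames whose mean energy
    exceeds the threshold. Frames at or below the threshold are dropped."""
    segments: List[List[float]] = []
    current: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            current.extend(frame)        # frame is part of an active segment
        elif current:
            segments.append(current)     # silence closes the open segment
            current = []
    if current:
        segments.append(current)         # flush a segment that runs to the end
    return segments
```

For instance, an input of silence, a loud burst, silence and a quieter burst yields two segments, and downstream parameter calculation would see only those segments.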
20. As per claim 5, Baker does not teach if the received signal of the audio type is in
the form of speech segments after voice activation detection, the parameter calculation 
means are activated by addressing the received signal to them as input signal, then the 
parameters calculated by the parameter calculation means are addressed to the 
recognition means as input parameters; otherwise, the server voice activation detection 
means are activated by addressing the received signal to them as input signal, then the 
segments extracted by the voice activation detection means are addressed to the 
parameter calculation means as input parameters, then the parameters calculated by 
the parameter calculation means are addressed to the recognition means as input 
parameters. 

Reding teaches if the received signal of the audio type is in the form of speech 
segments after voice activation detection, the parameter calculation means are 
activated by addressing the received signal to them as input signal, then the parameters 
calculated by the parameter calculation means are addressed to the recognition means 
as input parameters; otherwise, the server voice activation detection means are 
activated by addressing the received signal to them as input signal, then the segments 
extracted by the voice activation detection means are addressed to the parameter 
calculation means as input parameters, then the parameters calculated by the 
parameter calculation means are addressed to the recognition means as input 
parameters (features of the detected speech are extracted at the server, col. 13, line 63 
to col. 14, line 4). 
It would have been obvious to one of ordinary skill in the art at the time of
invention to modify Baker so that if the received signal of the audio type is in the form of 
speech segments after voice activation detection, the parameter calculation means are 
activated by addressing the received signal to them as input signal, then the parameters 
calculated by the parameter calculation means are addressed to the recognition means 
as input parameters; otherwise, the server voice activation detection means are 
activated by addressing the received signal to them as input signal, then the segments 
extracted by the voice activation detection means are addressed to the parameter 
calculation means as input parameters, then the parameters calculated by the 
parameter calculation means are addressed to the recognition means as input 
parameters as taught by Reding because it would ensure that only speech segments 
are processed rather than noise. 

21. Claims 9-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Reding in view of Baker. 

As per claim 9, Reding teaches a user terminal adapted for cooperating with a
server, comprising:

- means for obtaining an audio signal to be recognized;

- means for calculating modeling parameters for the audio signal (feature extractor
processes audio input, col. 8, lines 31-39); and

- control means for selecting a signal to be transmitted to the server from 
between the audio signal to be recognized and a signal indicating the calculated 
modeling parameters (if feature extraction is supported the features are transmitted;
otherwise it is the speech signal, col. 11, lines 34-48 and col. 12, lines 47-59);

- recognition means for associating at least one stored form with modeling
parameters calculated by the calculation means (local recognition, col. 11, lines 34-48); 

- means for transmitting a signal indicating the audio signal to be recognized to
the server (col. 12, lines 47-59). 

Reding does not teach that the transmitting is independent of the recognition means.

Baker teaches a system of distributed speech recognition where the speech 
transmitted to the server is done independently of the local recognition means. 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Reding to transmit the speech independent of the recognition means 
as taught by Baker because it would increase recognition reliability to have the 
recognition performed by multiple speech recognizers. 

22. As per claim 10, Reding teaches wherein the means for obtaining the audio
signal to be recognized comprise means for detecting voice activity in order to produce 
the signal to be recognized in the form of speech segments extracted from an original 
audio signal, outside of periods without voice activity (col. 11, lines 27-33).

23. As per claim 11, Reding teaches wherein the control means are designed to
select at least one signal to be transmitted to the server from amongst the original audio 
signal, the audio signal to be recognized in the form of the speech segments extracted 
by the voice activation detection means and the signal indicating the calculated 
modeling parameters (if feature extraction is supported the features are transmitted;
otherwise it is the speech signal, col. 11, lines 34-48 and col. 12, lines 47-59).
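The selection among the three candidate signals of claim 11, as read on Reding's rule that features are sent when the terminal supports feature extraction and speech is sent otherwise, might be sketched as below. The preference given to VAD segments over the original audio, and the function and flag names, are assumptions for illustration only.

```python
from typing import List, Optional, Sequence, Tuple, Union

Payload = Union[Sequence[float], Sequence[Sequence[float]]]


def select_signal(original: Sequence[float],
                  segments: Optional[List[List[float]]],
                  parameters: Optional[List[float]],
                  supports_features: bool) -> Tuple[str, Payload]:
    if supports_features and parameters is not None:
        # Terminal can extract features: send the modeling parameters.
        return ("parameters", parameters)
    if segments:
        # Assumed preference: VAD output, if any, before the raw audio.
        return ("segments", segments)
    # Fallback: send the original audio signal.
    return ("audio", original)
```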

24. As per claim 12, Reding teaches wherein at least one part of the parameter
calculation means and of the recognition means is downloaded from the server (model 
training performed on server and updated to user terminal, Fig. 7). 

25. As per claim 13, Reding teaches means for determining the stored form to be
chosen between the stored forms determined at the terminal and at the server,
respectively (determines whether to perform speech recognition at the terminal or the
server, col. 11, lines 34-48).



Conclusion 

26. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. Anastasakos et al. (U.S. Pat. 7,197,331) and Balasuriya (U.S.
Pat. 6,898,567) teach methods of distributed speech recognition that use both local
and remote speech recognizers. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to MATTHEW J. SKED whose telephone number is 
(571)272-7627. The examiner can normally be reached on Mon-Fri (8:00 am - 4:30 
pm). 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on (571) 272-7843. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



/Matthew J Sked/ 
Examiner, Art Unit 2626 



